Instruction Coalescing for 16-bit Code
Abstract
In the embedded domain, memory usage and energy consumption are critical constraints. To address these concerns, embedded processors such as the ARM and MIPS provide a 16-bit instruction set (called Thumb in the ARM CPU family) in addition to the 32-bit instruction set. Using 16-bit instructions, one can achieve code size reduction and I-cache energy savings at the cost of performance. This paper presents a novel approach that enhances the performance of 16-bit Thumb code. We have observed that throughout Thumb code there exist pairs of Thumb instructions that are equivalent to a single ARM instruction. We have developed enhancements to the processor microarchitecture and the Thumb instruction set to exploit this property. We enhance the Thumb instruction set by incorporating Augmenting eXtensions (AX). The compiler replaces a Thumb instruction pair that can be combined into a single ARM instruction with an AX+Thumb instruction pair, that is, an AX instruction followed by a Thumb instruction. At decode time, the AX instruction is coalesced with the immediately following Thumb instruction to generate a single ARM instruction. The enhanced microarchitecture ensures that coalescing neither introduces pipeline delays nor increases cycle time, so both instruction counts and cycle counts are reduced. Using AX instructions and coalescing hardware, we are also able to support efficient predicated execution in 16-bit mode.
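The decode-time coalescing described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the `Instr` and `ArmInstr` types, the `setshift` AX operation, and the tuple encodings are all hypothetical names introduced here for illustration. The model assumes an AX instruction that attaches a shift to the operand of the next Thumb instruction, so that the two 16-bit instructions decode into one ARM-style operation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """A 16-bit instruction in the toy model (hypothetical encoding)."""
    kind: str                 # "thumb" or "ax"
    op: str                   # e.g. "add", or the assumed AX op "setshift"
    operands: Tuple = ()


@dataclass(frozen=True)
class ArmInstr:
    """A decoded ARM-style operation with an optional operand shift."""
    op: str
    operands: Tuple
    shift: Optional[Tuple] = None   # e.g. ("lsl", 2)


def decode(stream):
    """Decode 16-bit instructions, coalescing each AX with its successor."""
    decoded = []
    pending_shift = None
    for ins in stream:
        if ins.kind == "ax":
            if ins.op != "setshift":
                raise ValueError(f"unknown AX op in this sketch: {ins.op}")
            # The AX instruction issues nothing by itself; it only
            # augments the Thumb instruction that immediately follows.
            pending_shift = ins.operands
            continue
        decoded.append(ArmInstr(ins.op, ins.operands, pending_shift))
        pending_shift = None
    if pending_shift is not None:
        raise ValueError("AX instruction with no following Thumb instruction")
    return decoded


# Three 16-bit instructions: one AX+Thumb pair and one plain Thumb instruction.
stream = [
    Instr("ax", "setshift", ("lsl", 2)),
    Instr("thumb", "add", ("r0", "r1", "r2")),
    Instr("thumb", "sub", ("r4", "r4", "r5")),
]
decoded = decode(stream)
print(len(stream), "fetched ->", len(decoded), "issued")
```

In this model the AX+Thumb pair yields a single operation corresponding to the ARM form `add r0, r1, r2, lsl #2`, so three fetched 16-bit instructions produce two issued operations. The model counts instructions only and says nothing about pipeline timing.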
Similar resources
A compact code 16-bit processor for embedded applications
This work proposes an instruction set that achieves small executable code for embedded applications. The aim of the design is to reduce the size of the executable code while maintaining the execution speed. Rather than applying instruction compression, which requires complex additional circuits, the approach taken in this work is to design the instruction set for the purpose of compact code. Th...
Mixed-Width Instruction Sets
Applications written for the embedded domain must perform under the constraints of limited memory and limited energy. While these constraints have always existed, current trends, such as mobile computing and ubiquitous computing, bring more and more complex applications to the embedded domain, making performance, or speed of execution, an important factor as well. For instance, we are now able...
Quantitative approach to ISA design and compilation for code size reduction
In this paper, an efficient instruction set architecture optimized for code size and targeting embedded telecommunication applications is introduced. Nowadays, mixed 16-bit and 32-bit instruction set approaches are commonly used to achieve code size reduction while minimizing performance loss. They are usually designed with some restrictions such as reducing the number of accessible registers, m...
Integrated Register Allocation and Instruction Scheduling with Constraint Programming
This dissertation proposes a combinatorial model, program representations, and constraint solving techniques for integrated register allocation and instruction scheduling in compiler back-ends. In contrast to traditional compilers based on heuristics, the proposed approach generates potentially optimal code by considering all trade-offs between interdependent decisions as a single optimization ...
Investigating the Potential of Custom Instruction Set Extensions for SHA-3 Candidates on a 16-bit Microcontroller Architecture
In this paper, we investigate the benefit of instruction set extensions for software implementations of all five SHA-3 candidates. To this end, we start from optimized assembly code for a common 16-bit microcontroller instruction set architecture. By themselves, these implementations provide reference for complexity of the algorithms on 16-bit architectures, commonly used in embedded systems. F...